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The string theory inspired Worldline Numerics approach to Casimir force calculations has some 
favourable characteristics that might make it well suited for geometric optimization problems as they 
arise e.g. in NEMS device engineering. We explain this aspect in detail, developing some refinements 
of the method along the way. Also, we comment on the problem of generalizing Worldline Numerics 
from scalars to photons in the presence of conductors. 

I 1 Introduction 

^ Casimir forces, i.e., electromagnetic forces arising due to quantum effects between uncharged conduc- 

tors at distances much larger than characteristic atomic radii, have first been predicted via theoretical 
considerations in 1948 \Xi, and have since been verified experimentally to 1% accuracy pj. While 
T-H initially a fringe subject - albeit unfortunately one that drew considerable attention from pseudo- 

^ scientific authors - research interest in Casimir forces strongly increased in particular in the last 

decade, and a number of novel theoretical tools were developed O lU O that allow the study of 
the Casimir effect in considerably more involved situations (geometries, material properties, tem- 
perature) than earlier investigations. At present, one major obstacle to research is that Casimir 
force calculations often still are computationally very demanding. Nevertheless, the development of 
theoretical tools and methods must go hand in hand with progress in nanoscale manufacturing, for 
it is clear that a sound understanding of the role of Casimir forces in nano machines will become 
increasingly important as we learn to manufacture on shorter length scales. 
' _ ^ One approach to the calculation of Casimir forces is based on the "Worldline approach" developed 

^ by Gies, Klingm filler, Langfeld and Moyaerts [TKHIH]. While this mostly has been used to study a 

r ""j simplified field theoretic model with massles scalars instead of vector gauge bosons (photons) , and 

nowadays alternative methods are available to directly calculate electrodynamic effects even with 
^ frequency-dependent optical properties of materials |9l [10] , the worldline approach is interesting for 

a number of reasons; 

• Due to the probabilistic nature of the method, it is sometimes computationally comparatively 
cheap (depending on the geometry) to obtain a rough estimate of Casimir forces. 

• With very little effort, the calculation can be modified in such a way that it simultaneously gives 
all the forces on a number of bodies, making it potentially attractive for problems requiring a 
geometric shape optimization approach. 

• Finally, it is formulated in a way that is suggestive of a remarkably intuitive interpretation. 
This may manage to make this conceptually subtle quantum effect somewhat accessible to wider 
audiences not necessarily deeply familiar with quantum field theory. (One should compare this 
with the didactic problems related to the "mesomeric effect" in chemistry.) 
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Figure 1: Beam bending under the influence of external forces and moments, a standard problem in Engineering, may be regarded 
as an energy minimization problem. Geometries such as the one shown above that involve bending beams at tens-of-nanometers 
length scales are presently being discussed as nanoelcctromechanical memory devices [llj. Here, the beam also serves as a floating 
gate that allows electronic read-out of its mechanical state (upward bent or downward bent). Discrctizing the beam into elements 
with one translational degree of motion, the nature of the Worldlinc numerics algorithm makes it easy to simultaneously calculate 
the Casimir forces on all elements with a computational effort less than one order of magnitude above the effort required to get a 
reliable estimate for the Casimir energy of the given configuration — independent of the number of elements. This is not easily 
achieved with a number of alternative methods to calculate Casimir forces. 



In this article, we will focus mostly on the second item in the list above, which is elucidated in 
detail in Section [2.5| The key insight is that a deformable object, such as a beam bending under the 
influence of Casimir forces as shown in figure [T] when discretized into A'^ elements (blocks) that are 
partially restricted in their relative motion, may require a certain computational effort, T processor- 
seconds, for a reasonable estimate of the Casimir energy of a given configuration, but will then always 
allow the simultaneous computation of Casimir forces (and moments) on all N elements in a way 
that requires at most 5T processor-seconds, irrespective of the number of bodies N. 



2 Computational Methods 

In contradistinction to other approaches, Worldline Numerics is remarkably simple to implement in a 
computer program, requiring almost no advanced software library infrastructure to deal with issues 
such as parallel sparse matrix linear algebra, multipole moments, or large matrix eigenvalues. Still, 
there are a number of useful improvements of the basic method that help to ensure making effective 
use of computational resources. As these are often linked to one another, it makes sense to address 
them in a coherent fashion. 



2.1 Monte Carlo integration over Loops 

The Casimir energy for a static geometry that can be modeled by a position-dependent potential V{x) 
is given as the quantum effective action per unit time: 
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The "Worldline Numerics" approach by Gies, Langfeld, and Moyaerts [12] is based on re-writing 
the logarithm in the effective action as an integral a la ln(p/g) — > Jy°° (exp(— pa;) — exp(— gx)) and 
re-expressing the operator trace as a Feynman path integral. This leads, for a real scalar field of 
mass m interacting only with the external potential V, to an expression for the effective action that 
is numerically tractable via Monte Carlo methods. The key expression from ^12j, which we repeat 
here for the convenience of the readers, is: 

^^[^]--2(4^.C?^"'"^^/ H^WAv.^^n^-l] (2) 



where an UV cut-off regulator A has been introduced. Here, the expectation value {■)y is the ensemble 
average over all closed loop (c.l.) gaussian random walks y : [0; 1] i— >■ R, y{Q) = j/(l) of "Wilson loops" 
re-scaled to proper time T. Let the statistical weight of the loop y be 

p[y]=e^v(^- j'^\ty{tf/^ (3) 

then: 

/w r Sy^,.T^vWv[y;x,T]p[y] 
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where Wv depends on the path y and on position shift x and propertime T": 

Wvly;x,T]^exp(^^T dtV{x + VTy{t))y (5) 

From this expression for the effective action of a free scalar interacting with a potential, Casimir 
forces can be obtained by using the position dependency of the potential V{x) to model the geometry, 
and calculating energy changes associated with changes to the geometry. 

While the applicability of this model for the calculation of real Casimir forces is questionable (even 
for perfect conductors) as the physics of photons is quite different from that of a scalar field, the 
remarkable conceptual simplicity of the above expressions certainly warrants a deeper investigation of 
its properties and potential utility, for it might actually allow a (yet undiscovered) generalization to 
the photon case. For the electromagnetic case, one would naturally want to start with investigations 
of perfect conductor surfaces, and the (obvious) scalar pendant of this is a potential V{x) that 
suppresses all quantum fluctuations inside the given bodies. It is not difficult to see that one may 
alternatively restrict the potential to have non-zero values only close to the surfaces of bodies, taking 

V{x) = \ j d^a5^{x-x^) (6) 

and considering the limit A — )■ oo. Then, Wv[y',x,T\ reduces to: 

1 Loop pierces a surface 
Loop does not pierce a surface. 



exp 



-T [ dtV{x + VTyit)) 
Jo 



Substituting s — \/r to eliminate the square root, and using translation invariance of the integral 
to ensure all loops have center of gravity at the origin, the expression for the geometry-dependent 
regularized Casimir energy then is: 



L,/JJ(e.(.,,.)). (8) 
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where the mean value is over all unit loops, sy is the loop y{t) scaled pointwise around its center of 
gravity y by the factor s, and 

„ , , f Loop does not pierce a surface , , 

Qv{sy,x) = l ^ ^ (9) 

1 1 Loop pierces a surface. 

The problem with this approach is that the Casimir energy attributed to the surface of any 
single body goes to infinity as we send the energy regulator A to infinity, due to the contribution 
from very short loops close to the surface. (This is, of course, a non-physical artefact related to 
the "geometry much larger than atomic scales" approximation.) In order to predict Casimir forces 
between different objects we are only interested in the dependency of the energy on the relative 
position of these objects. Therefore, it makes sense to modify this scheme in a way that (i) the 
contribution of each loop is taken into account relative to a configuration in which all objects are 
at infinite separation from one another, and (ii) the Casimir energy contribution attributed to loops 
piercing only one object surface (hence, "belonging" to that object) is taken as zero. 

Consider n objects with potentials V\, . . . , Vn- Then the total potential is = Vi + V2 + ■ ■ ■ + Ki 
and we use the freedom to shift the absolute energy level to define the "interaction Casimir energy" E 
as in [12] as the energy difference relative to a configuration in which every body is at an effectively 
infinite distance from every other body: 



Using this, we get 



with O given by 



E = (r[v] - r[Vi] - r[V2] r[v;.]) /Ar (lo) 



1 roo 1 

i5^-(^/_,(e(..,.)>. (11) 



I Re-scaled loop does not pierce any surface 

6(sy, a;) = < (12) 

II — n Re-scaled loop pierces the surfaces of n > 1 objects 

If n objects come close to one another, every loop that pierces all of them can be regarded as 
the image of n loops, each to be considered as being attached to (and moving with) that body. 
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Hence, when objects are in proximity, we count a loop once that would have been counted n times 
instead for separated objects. (Note that the counting weight of both a loop that pierces no surface, 
and a loop that pierces only one surface, is zero.) If the objects are now spatially separated the 
integral J^f^ ^{0{sy,x))y is finite and well behaved for A — > 0, so we can safely set A = 0. One is 
easily convinced that this is indeed the correct expression by considering a simple geometry (such 
as two parallel flat slabs) and requesting that the Casimir force does not change if one object is 
instead thought of as being made of two adjacent bodies. The counting weights are dictated by the 
convention for the "zero energy" configuration. 

For each loop y, the weight Q, as a function of the re-scaling factor s, is piece-wise constant. 
The s-integral hence can easily be performed analytically. Rather than being only a convenient sim- 
plification that saves computing time, this property plays a crucial role for the efficient simultaneous 
computation of multibody forces, cf. Section [2. 5| 



2.2 Loop Generation 

When trying to evaluate Equation |11| one naturally would try to discretize the loop as consisting 
of a finite number of straight sections. Taking the procedure literally, the presence of complicated 
curved geometries would mandate computationally fairly expensive ray-surface intersection checks. 
In many cases, a better investment of the computational effort may be to instead make the number 
of discretization points on the loop sufficiently large to ensure that simple inside/outside checks 
applied to each point give a reasonably close approximation. Still, generic ray/surface intersection 



checks can become useful, especially if the complicated multiple integral in Equation 11 (over loop 
shapes, loop sizes, and loop centers of gravity) can partially be evaluated by analytic, or rather semi- 
analytic means that involve numerical approximation of integral boundaries. This is relevant for the 
discussion in Section [2.4[ and may make major computational improvements of the method possible. 
In this context, we want to point out the existence of advanced algorithms useful for ray/surface 
intersection problems, such as Comba and Stolfi's Affine Arithmetic |131I14| . 

In order to generate a properly distributed random sample of loops, we first generalize the problem 
to finding a process that produces piecewise straight paths with gaussian length distribution (of given 
standard deviation) for a given starting and end point (not necessarily coincident). This problem 
can be re-phrased as finding pairs of such paths, each with its own starting point but at first without 
any constraint on their endpoints, and imposing the condition that they meet at their endpoints. 
Concatenating the first path to the reverse of the second solves the problem of finding a path with the 
correct distribution between two given points. One easily sees that the distribution of the midpoint 
is still gaussian (being the product of two gaussian distributions) . Hence, we can sample a loop by 
recursively sampling an intermediate point in the interval between a given start and end point. 

This method, known as the "d ('doubling') loop algorithm" [15], manages to generate closed 
loops with the desired distribution with very little effort. A slight problem of this approach is that 
it only generates loops for which the number of vertices is a power of two, but as one usually is only 
interested in getting below a given resolution, and this method will at worst require an effort too large 
by a factor 2, this is practically irrelevant. Nevertheless, the reasoning presented above is readily 
generalized to non-equal subdivisions (e.g. 1/3 plus 2/3), and from there to also allow recursive 
subdivision into other numbers of parts, but this is typically not needed. In order to generate a loop 
containing 10 points using the extended d-loop method, one would first determine the opposite point 
(point #6) of the starting point (point #1), then for both arcs use a 2 : 3 split of the associated 
variance budget and work out the gaussian distribution of the corresponding intermediate point's 
position. Sub-division of the arc of length 2 is straightforward, while the length 3 arc would first be 
split 1 : 2 using the same approach. 

Rather than choosing starting points randomly in space and then determining the location of the 
halfway-round-the-loop point, it makes sense to perform stratified spatial sampling on a lattice. To 
do so, we choose the first point to be the lattice point and take the halfway point to be gaussian 
distributed with mean the first point and standard deviation a characteristic length. We then continue 
sampling a loop of that length and later scale it around the midpoint between the first point and the 
halfway point appropriately to obtain a unit loop. In that way, the midpoint is gaussian distributed 
around the lattice point with a given length scale. 

If we want unbiased integration by taking loops for each lattice point, the lattice points have to 
be representative for loops sampled in their vicinity, with a characteristic length being that of the 
grid. This is certainly true if the grid is fine compared to any characteristic length of the geometry. 
However, the same can be achieved for arbitrary grids, if we take the characteristic length in the just 
described stratified sampling to be that of the grid. 
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A different kind of lattice effects has to be taken into account for methods computing a force 
as a difference in energy for two given geometries. Such methods would typically put a fixed set of 
loops on each lattice point and add up their energy contributions. Then they would do the same for 
the same geometry with one object moved in a particular direction. The difference in energy is then 
proportional to the force component on the moved object in the given direction. 

To focus on the net effect, as we do with symmetries (see |2.7[ ), one typically would use the same 
set of loops for both geometries. Also, to have for each loop a corresponding shifted loop, the amount 
the object is moved has to be a multiple of the grid length (in this direction). While doing otherwise 
would not necessarily yield a bias, doing so significantly improves the convergence speed of this 
method. 

For our purposes, we typically only ask simple questions about each loop, such as "which objects 
does it hit?", or (at most) "For what scaling intervals does this loop, centered at Xcm but re-scaled 
in size, hit object OnT\ In order to answer these, only very little information needs to be stored 
when visiting the loop point by point. So it is possible to implement the relevant algorithms in such 
a way that the loop is generated on the fly, and we never have to store the entire loop in memory - 
the number of points we have to remember is about the binary logarithm on the loop length. This 
yields an algorithm with a very small memory footprint and attractive characteristics for computing 
architectures that emphasize a high degree of parallelism between very simple cores. 

2.3 Numerical Integration over the Scaling Factor 

One approach to obtain energies — and so ultimately, by comparing energies for different geometric 
configurations, forces — is to directly evaluate the integral in Equation [Tl] numerically. Naively, one 
would have to, for various values of s, estimate {Q{sy,x))y and summing up. Since the order of 
summing up does not matter, we can as well compute the expectation value of the following process. 

Choose s uniformly at random from the interval [a,b]. Randomly generate a loop sy of 
size s and count (1 — n)/s^ if the loop hits n>2 objects, and otherwise. 

Here [a, b] is an interval big enough so that integrating over that interval does not differ noticeably 
from integrating over all positive reals. 

Looking at that random process more closely, one notes that the information about the random 
loop we use is the number of objects it hits. We have to pay particular attention to short loops 
that are just long enough to barely touch multiple objects, for they give the largest contribution to 
the sum. One should note that it is not possible to attribute a useful physical meaning to absolute 
differences in loop scaling factors s: for a loop that hits (at least) two objects, the effect of changing s 
to s + O.l very much depends on what the magnitude of s is. As relative changes of the scaling factor 
hence are more important than absolute changes, we much prefer a distribution, when sampling 
loops, that handles all orders of magnitude equally. In other words, we prefer a distribution where 
the logarithm of s is uniformly distributed on [ln(a), ln(6)]. 

When changing the distribution of s, we also have to transform the weight attributed to each 
sample accordingly. Taking the logarithm of s to be uniformly distributed, rather than s itself, each 
value s will be 1/s times as likely as before. To still get the same expectation, we have to multiply 
each value by s. Hence, we are finally left with estimating the expectation of the following process: 

Chose a = Ins uniformly at random in the interval [ln(a), In(fo)]. Randomly generate a 
loop of size e"^ and count (1 — n)e~'''^ = (1 — n)s~'^ if loop hits n > 2 elements, and 
otherwise. 

2.4 Symbolic Integration over the Scaling Factor 

It makes sense to try to perform at least part of the integration needed to evaluate eq. [2]symbolically, 
for two independent reasons. While this may on the one hand help to simplify the problem, it also 
gives us a much more useful handle on problems that involve changing geometries. As we are much 
more interested in Casimir forces (and moments) than just energies, this is obviously desirable. 

In particular, we can, as in [TB], typically perform the integration over the loop scaling factor 
X<=o 7i'd{sy,x) symbolically. 

If we have sampled a loop y, we can compute for each sampling point the values of s for which 
this sampling point is inside a given object. Often, this is just an interval, or at worst the union 
of a few intervals. By merging these intervals for each sampling point, we can compute the set 
of s values for which the loop hits the given object. Now, as O counts the number of objects hit 
by the loop, it is piecewise constant on the partitioning so obtained; if O = n for T G [a,b], we 
have Jl^ ji e{sy,x) = n{a-* - 6-*)/4. 
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Note that this means that we also do not need to specify the region of s which we want to sample, 
i.e. our method does not need to know a geometric length-scale. 

2.5 Forces on Multiple Bodies via Sensitivity Backpropagation 

In order to calculate forces, we have to determine by how much Casimir energies change when chang- 
ing the geometry. Taking the limit A — >■ oo, the subtraction scheme eq |10| is only compatible with 
geometry changes that change the position and orientation, but not shape, of individual bodies: evi- 
dently, the Casimir energy attributed to a single body via this scheme is zero, regardless of its shape. 
As we naturally would expect the Casimir force between two parallel flat plates to not disappear 
when connecting them with a thin wire (so that they become a single object), the regularization 
prescription that amounts to attributing forbidden loops to specific objects cannot be compatible 
with (unconstrained) shape changes. 

Even if we limit ourselves to shifts and rotations of bodies, retaining eg. |10[ it makes sense to de- 
scribe geometry changes in a more general way. We hence consider the potential to be a function of 
multiple geometry parameters (positions and angles), e.g. V — V{ri, r2, r^,ri, r2, r^, a?, , 73 ; . . ■). 

If we perform integration over the scaling parameter s analytically for every loop and consider 
the intervals over which the subset of objects pierced by that loop does not change, then the inter- 
val endpoints, will become analytic functions of the geometry parameters - and so will the loop's 
contribution to the total Casimir energy. 

For any given loop, we have to perform a fairly simple computation, which in the end gives a 
single number. Furthermore, it is perfectly feasible to design the algorithm in such a way that at the 
end of the computation of the loop's contribution to the Casimir energy, we still remember all the 
intermediate values that entered that calculation. In such a situation, there is a standard method 
(in the sense of an algorithmic transformation on the program that calculates the loop energy) that 
allows a fast evaluation of the gradient with respect to all the geometry parameters. Irrespective of 
the number A'^ of such parameters, and the complexity of the intermediate expressions, it is possible to 
obtain the gradient to full numerical accuracy with at most 5 times the computational effort needed 
to calculate the scalar function (often even much less). In comparison, the naive direct method of 
calculating the gradient by comparing function values would require at least A' -|- 1 full evaluations 
of the function (and then only give a result with reduced numerical accuracy). 

The generic approach that makes this possible, which has become known under the names "au- 
tomatic/algorithmic differentiation", "sensitivity backpropagation", or "adjoint code" [171 I18| is 
essentially based on this idea: 

• Every intermediate result used in the calculation gets stored away for later use (i.e. none may 
be dropped or over- written) . 

• To each such intermediate quantity Ik stored, we associate a buffer that can store another 
number Ik , initialized to zero at the beginning of the program. Ultimately, these will each end 
up holding the answer to the question: "If, at the point when Ik became first known during 
the calculation of the function value, we interrupted the computation, changed that value from 
Ik to I'k = Ik + e, and then allowed the computation to proceed without further modifications 
now using I'k instead of Ik, by how much would the final result then change, relative to e (in 
the limit of small e)?". Hence, the Ik will eventually become sensitivities that describe the 
dependence of the result on the given intermediate quantity. 

• Treating input parameters in the same way as intermediate values, the sensitivities on all the 
input values give the function's gradient. 

• Sensitivities are calculated in a two-step process: first, the function is evaluated once to obtain 
all the intermediate quantities for the given choice of input parameters. Then, starting from 
the result, which has been obtained from an arithmetic operation involving intermediate values, 
sensitivities for the last intermediate get updated. As these again have been the result of some 
arithmetic operation, the sensitivities for the intermediate values they have been obtained from 
can be determined, etc. As one intermediate quantity may be used multiple times throughout 
the calculation, it is important to collect incremental contributions to sensitivities when going 
backwards through the computation. 

The theoretical maximum effort factor of 5 can be traced back to the effort required to handle 
multiplication/division of intermediate values. Evidently, if the sensitivity of the result on the inter- 
mediate quantity Ik is known in Ik, and Ik was obtained as the sum of U -\- Ij, then the sensitivities 
of li and li must be increased by Ik (for if e.g. Ij gets used multiple times, then Ij receives multiple 
increments) . If, however, Ik is the product of /; and Ij , then li must be increased by Ij ■ Ik and vice 
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versa - the combined read/add/store operations give rise to a bounded multiplicative factor for the 
total effort. 

In practice, the corresponding calculations are even considerably simpler than what the general 
theory of algorithmic differentiation suggests, as the partial derivatives dE/dgj with respect to the 
geometry parameters gj are, at least for simple surfaces, obtainable directly as a by-product of the 
forward calculation. Comparing the Worldline method with the remarkable approach discovered by 
Rodriguez, Ibannescu, lannuzzi, Capasso, Joannopoulos, and Johnson [191 120] or for that matter 
any method that involves numerically solving discretized sparse linear operator equation systems, 
one would naturally not expect sensitivity backpropagation to be as readily applicable with these 
approaches as with Worldline Numerics. On the one hand, sensitivity backpropagating an iterative 
linear solver would require remembering the intermediate values from all iteration steps (otherwise 
thrown away), and also, the calculation may have happened to produce a solution before having 
explored some of the dependencies sufficiently well. While research has been done on backpropagating 
linear solvers, the incorporation of this strategy into Worldline Numerics undeniably is much easier 
to accomplish. 

2.6 Adaptive Sampling 

In a typical geometry, essentially the whole energy or force is contributed by few, comparably small 
regions. These are typically the regions where two objects come closely together. 

While we still have to sample loops in such a way that we integrate over all of the relevant region 
of space, it is worthwhile to focus effort mainly on these highly contributing areas, as the absolute 
uncertainty of our Monte-Carlo estimation is much higher there. We achieve this in the following 
way: We first specify an absolute accuracy to which we want the density estimated to at every point. 
When later sampling the density at a given point, we first take a specified minimum of samples. 
From that we estimate the (unbiased) variance of our sampling at this point. We continue sampling 
until a pre-defined (95%) confidence interval for the sampling mean is smaller than the prespecified 
accuracy. 

2.7 Living with Cancellation 

Some geometries, like the "cylinders with sidewalls" geometry studied in [21], show a high degree of 
symmetry. While perfect symmetry helps to reduce the computational effort as the calculation can 
be restricted to a fundamental domain, slightly non-symmetric configurations often are a problem if 
we want to compute the force on an object that gets pulled in different (perhaps opposing) directions: 
most of the contributions cancel, giving rise to a small residual force. 

A naive approach would compute the contribution at both sides of the object separately and 
then add up. This, however, would yield a huge variance for a comparably small resulting value. 
Fortunately, the force contribution of a loop and its mirrored image are highly correlated in these 
situations. Often, one is the negative value of the other. So we have a better way of estimating the 
contribution by estimating the expectation of the following process: 

Randomly pick a loop and also consider its mirror image under the symmetry; then add 
up the force contributions of both these loops. 

In that way, we do not change the expectation value of the sum, but, due to the correlation, 
the variance is much smaller. In that way, it is possible to compute force contributions where a 
naive approach would require excessive effort due the huge variation, as in the system discussed in 
Section [S] 

2.8 Parallelization 

In the Worldline formalism the calculation of contributions to the total energy (or force) can be 
performed independently for each grid point. Also, for each grid point, the contribution of each loop 
to the energy (or force) does not depend on the contribution of the other loops. Thus the problem is 
easily seen as being embarrassingly parallel, and furthermore the basic component - processing a loop 
- does not require overy complex calculations (in the sense of memory requirements and algorithmic 
effort) . As the worldline method is a probabilistic approach, the accuracy of the calculation can be 
increased (within reasonable limits dictated by computational effort) by increasing the number of 
grid points, number of loops, and the number of points per loop in an appropriate way. 

As the computation for each point and loop follows the same algorithm, this approach fits the 
SIMD (single instruction, multiple data) processing approach very well. Powerful massively parallel 



7 



while (stacksizc >0) { 

pop(StartPos, EndPos, &lcvcl); 
if (level > 0){ 

MidPos — (StartPos + EndPos) / 2 + gauss _nor m al ( , sigma ( 1 c v c 1 ) J 
push(MidPos, EndPos, level -1); 
push ( StartPos , MidPos, level -1); 
} 

else calc.contribution ( StartPos ) ; 

} 

Fi gUrC 2; Schematic structure of the code managing a stack of yet-to-be-split arcs. 



SIMD hardware is now available at a highly competitive price in the form of specialized Graphics 
Processing Units (GPUs), where development has - to a large extent - been driven by the video game 
industry. 

When implementing Worldline Numerics on GPU hardware, one has to bear in mind certain 
constraints of GPU programming. The memory hierarchy of most GPUs discerns between global 
memory and a special form of fast local memory called "shared memory". It is often favourable 
to perform most of the computations in shared memory. This fast shared memory typically is 
quite small and has to be divided between simultaneously running work items, thus limiting the 
number of loop points that can be computed concurrently. One also has to keep in mind that GPUs 
perform rather poorly on complex branching patterns and the hardware optimizations on GPUs 
mainly target a high throughput of floating point operations, often neglecting the performance on 
integer calculations needed for most Pseudo Random Number Generators (PRNGs). 

On account of shared memory size limitations, the authors developed a version of the d-loop 
algorithm that generates, for each loop, the loop points on the fly - without ever storing the entire 
loop in memory. As GPUs do not support recursion directly, it is advantageous to instead manage a 
stack of arcs yet to be split in half, as sketched in figure |2] 

This manages to reduce the memory footprint of loop generation and processing from 0{N) to 
C'(log(A'')), A'^ being the number of loop points. This makes it possible in principle to shift loop 
generation to GPU cores even for loops too large to fit into the shared GPU memory. In most 
cases, however, it seems more appropriate to instead pre-generate loop shapes on the CPU and 
subsequently upload them into GPU read-only memory visible to each GPU work item. The GPU 
threads then perform intersection checks for loops of pre-determined shape shifted to different grid 
points. Using the same set of loops at all grid points also helps in terms of statistics as there then 
can be direct cancellation between opposing forces arising from similar geometric structures. (See 



also the discussion in Section 2.7 



The authors so far used GPUs mainly with the direct numerical integration method described in 
section 2.3 and have at the time of this writing not yet generalized the (algorithmically much less 
uniform) handling of scale factor intervals to the GPU for different geometries. First computations 
in the plate-plate geometry show a much better convergence than a full Monte-Carlo integration. 

As language for the implementation of the algorithm the authors have used OpenCL in combi- 
nation with PyOpenCL|22]. This helps code re-use across a broad spectrum of multi- or manycore 
architectures. 

2.9 Generalization to Electrodynamics? 

While Worldline Numerics has drawbacks in comparison to other computational approaches, such as 
e.g. its inability to easily incorporate frequency-dependent optical properties of immersion media, 
it also has, from an applications perspective, some potentially very attractive characteristics (as 
we have reasoned out in this article). The biggest present obstacle to the utilization of Worldline 
Numerics for engineering applications is that it has not yet been generalized from (massless and 
massive) scalar fields to photons interacting with conductors. As we demonstrate in section|3j trying 
to use scalars to approximate the behaviour of photons can easily give qualitatively wrong results - 
so, the need to be able to handle photons is quite pressing from an application perspective. 

While worldline methods can in principle be adopted to dealing with vector bosons, as has been 
explored e.g. in [23| for gluon loop radiative corrections, the challenge is in properly modeling 
the boundary conditions for photon-conductor interaction. Considering the structure of the theory, 
one would at least expect that it should be achievable to couple the photon to a charged scalar 
undergoing spontaneous symmetry breaking, giving it an effective mass inside each object. This 
would then amount to modeling electromagnetic Casimir forces in the presence of superconductors. 
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Figure 3: Casimir forces in the piston geometry as a function of barrier height, calculated for scalars and photons using the 
analytic expressions given in [24]. 

Rather than trying to construct a generahzation of Worldhne Numerics for photons and conduc- 
tors starting from quantum electrodynamics principles, we for now approach this problem in a more 
adventurous way, if only to generate ideas: here, our guiding question is what the simplest conceiv- 
able generalization of Worldline Numerics might be that could possibly stand a chance of modeling 
electrodynamics. Naturally, the subsequent discussion in this section will be of highly speculative 
nature. 

Obviously, such a generalization will involve having photons propagate in loops. As we are not 
particularly interested in manifest Lorentz invariance here, we may just as well straightaway choose to 
gauge away the non-dynamic timelike component of the electromagnetic vector potential ^o, leaving 
us with a three-dimensional vector describing the photon polarization state. This is, of course, still 
redundant, as the photon only has two (transversal) physical states rather than three, but it might 
be conceivable to ultimately end up with a formulation in which, after evaluating the integrals over 
loop shapes and sizes, it turns out that the boundaries do not interact with (i.e. never see) the 
longitudinal photons. In such a case, they would drop out from all Casimir forces. 

It makes sense to assume that, as in the scalar case, bodies can be modeled as hollow thin shells; if 
we accept the perfect conductor approximation (which in particular states that characteristic length 
scales from the geometry are large in comparison to characteristic atomic distances), we would not 
expect to be able to probe the inner structure of conductors by looking at Casimir forces between 
them. So, we need to only concern ourselves with what happens when a photon loop pierces a 
conductor surface. The boundary conditions of a perfect conductor (n x .E = - no electric field 
parallel to surface, as this would immediately be compensated by charges shifting accordingly, and 
n • B = - no magnetic field perpendicular to surface, as this would be compensated by eddy 
currents, at least for frequencies cu sufficiently large for the perfect conductor approximation to 
hold) have to be implemented in some way. We want to consider all possible (including un-physical 
longitudinal) photon polarizations simultaneously, and hence should associate to every loop edge a 
3x3 matrix acting on the polarization state. In the end, the contribution to the Casimir energy from 
the loop under consideration should be taken to be the trace of the product of all these matrices; 
free propagation will amount to the identity, and the matrix corresponding to an edge that pierces a 
conductor surface would involve a projection eliminating some polarizations. What form may such 
a projection matrix have? As we already accounted for photon polarization directions, the only 
directions available are the surface normal n, as well as the local velocity y. The E = dtA ~ uA 
condition only involves perpendicularity to n, and while the B-condition would be expected to pick 
up spatial components of y, re-scaling these in order to form a projector seems to also bring us to 
a. A J- n condition, which means that one should take as projector that restricts to forbidden states 
the matrix I — n®n. Evidently, in the case of parallel plates, the trace would just introduce a factor 
two relative to the scalar field, as desired. 
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Figure 4: Compai isoii of predicted forces in the Casimir Piston geometry: The graph shows the ratio of the Casimir forces 
computed with our method to the exact, analytically computed, values 1241 . see Figurels] This ratio is plotted for computations 
based on scalar fields (upper curve) and photons (lower curve). A constant ratio would indicate perfect coincidence of the 
methods; note that constant scaling factors (e.g., due to fundamental constants of physics) have not been taken into account in 
our computations, as they were only carried out to test the functional dependency. 



The prescription described above seems, at the superficial level, quite obviously wrong - this 
is most easily seen by considering a loop close to a box corner made up of different objects: if it 
pierces three mutually perpendicular faces, it would contribute three forbidden polarizations, where 
the physical photon only has two. Still, it is valid to ask whether this loop processing prescription 
is related to some idealization of boundaries in a quantum field theory of vectors - and ask what it 
might look like. 

One would, for the reason given above, naturally expect such a prescription to give energies that 
do not have much in common with photon Casimir energies. An attractive nontrivial test geometry 
for which scalar and photon Casimir forces are known exactly, where discrepancies could reasonably 
be expected to manifest, and in which this scheme is easily implemented, is the piston geometry 
studied in [21]. If we plot the quotient of the (not yet properly normalized) energies predicted by 
Worldline Numerics and the analytic result for the scalar field, we get curve 1 in figure [4] If we 
compute the same ratio for our aventurous "Vector Worldline" generalization and the known result 
for photons, we get curve 2. 

We consider a coordinate aligned closed box with square cross section of side length b = c — 1 
and height height b - here, we choose h — 100. This box is sub-divided by a coordinate-aligned 
box-shaped movable barrier of identical cross section and infinitely small thickness; the distance 
between barrier and base of the box is taken to lie in the interval [0; 1]. 

The spatial sampling of loop positions was done on a Cartesian grid with lattice spacing 0.05, 
covering a coordinate-parallel cube of edge length 3 centered at the point 0.5 above the center of the 
box base plate. Forces have been calculated for barrier heights from 0.3 to 0.8 in steps of 0.1, using 
a central difi'erence quotient with 5h — 0.1. 

Since the barrier is taken as being as wide as the box, there is no positive minimum to the 
contributing loop size. However, for loops much smaller than the distance between barrier and 
base of the box the local geometry on either side of the barrier symmetric. So the contribution 
of sufficiently small loops cancels out. In order to avoid infinities, the loop scaling factor s was 



constrained to the range [0.02; 6] and sampled as described in Section 2.3 



In this system, one must keep in mind that, for the distances involved, Casimir energies, both in 
the scalar and photon case, vary by more than three orders of magnitude - as shown in Figure |3] 
While the ratio clearly shows some drift for the photon case in this experimental calculation, it cannot 
yet be excluded that this may be a discretization related artefact. Given that the simplistic ansatz 
proposed here would be expected to quite clearly show up as being deeply wrong (considering the 
ratios involved), this outcome is somewhat remarkable. At the very least, it seems to make sense to 
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Fi gUre 5; The "cylinders with sidewalls" geometry and the dependency of the attractive scalar Casimir force between the 
cylinders on the Position of the plates. At position 1.0, the plates would touch the cylinders. 



try and explore a small number of other geometries to learn whether this is a curious coincidence for 
this particular geometry (as one would expect), whether some not yet understood crazy cancellations 
may actually make this method work as a model for photons (unlikely), or whether it may turn out as 
being somewhat useful pragmatically in the sense of a heuristic engineering method that is known to 
be mathematically flawed (as many such engineering heuristics are) , but manages to give a reasonably 
good estimate for some applications. 



3 Parallel Cylinders between Plates, revisited 

The "cylinders with sidewalls" geometry studied in [21^ (Figure |5j on the left) has been shown to 
nicely demonstrate that Casimir forces are essentially multi-body forces. It is given by two parallel, 
infinitely long cylinders between two parallel infinite plates. Focusing on the attractive force between 
cylinders, one finds that this depends in a fairly subtle way on the distance between the plates. 

For the calculation whose results are shown in figure [S] the cylinders used have radius 1.0 and 
centres at {x, j/)-coordinates (-2.0, 0.0) and (2.0, 0.0), respectively. The plates are given by the 
equation z = p, where p is varied in the range 1.02. . . 2.50. The calculation of the force on the 
left cylinder used the scaling method, sampling around a grid with spacing 0.05 and exploiting the 



mirror method (Section 2.71 by taking a reflection-symmetric loop ensemble w.r.t. the plane x=- 
2.0 to r educ e the variance due to cancellation. This sampling was done in an adaptive manner 
(Section 2.61. In total, between 1.9 ■ 10^ and 3.9 ■ 10^ loops of 2^^ points were sampled for each 



geometry. 

The results are shown at the right-hand side of Figure [5] As opposed to methods such as the 
proximity force approximation, we do see a dependency of the forces on the cylinders on the plate 
distance. We however could not flnd the non-monotonic behaviour reported in the literature |21| . 
So, once again, we have an example where scalars behave in a qualitatively different way than 
photons p4| . 



4 Conclusions 

From a microsystems engineering perspective, the Worldline Numerics / Loop Cloud Method has 
a number of attractive properties, such as the ability to quickly give crude estimates, considerable 
potential to solve geometric optimization related problems, and of course its conceptual simplicity 
and intuitiveness that make it a useful educational tool with the potential to give a simple yet 
quantitatively correct mental model of the origin of Casimir forces. Quite remarkably, the operational 
procedure can be explained using very simple concepts only - in fact, even without having to use 
much linear algebra. 

At present, the biggest obstacle to its utilization for engineering applications is the method's 
inability to handle photons in the presence of conducting boundaries instead of scalar particles in 
the presence of Dirichlet boundary conditions. As we have demonstrated in section [S] through an 
example calculation, the problem is that any attempt to use scalars in order to approximate photon 
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Casimir forces is questionable as this can easily give predictions that are wrong already at the 
qualitative level. 

In addition to this calculation, and to discussing some methodological improvements of the Loop 
Cloud Method, this article provided some early speculation on whether a simple extension of the 
method may be able to give useful estimates for photon Casimir forces; despite obvious conceptual 
issues, the accompanying computation turned out to produce numbers that match the known exact 
result remarkably well, in particular considering that Casimir forces range over a few orders of 
magnitude for the problem studied. Still, one naturally would next try to refute the viability of this 
simplistic approach by checking its predictions for another not too trivial geometry, before starting 
any attempt to formally prove its validity. 
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